
    SLIM : Scalable Linkage of Mobility Data

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup.

    Learning Mixtures of Gaussians in High Dimensions

    Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples coming from a random one out of k Gaussian distributions in R^n, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n > Ω(k^2), we give an algorithm that learns the parameters with polynomial running time and using a polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
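For readers unfamiliar with the moment structure mentioned in this abstract, the fourth-order case of Isserlis' (Wick's) theorem can be written out as below. This is the standard textbook statement for a zero-mean jointly Gaussian vector, not a formula taken from the paper; the symbols x_i and \Sigma_{ij} are notation assumed here for illustration.

```latex
% Assumption: (x_1, x_2, x_3, x_4) is zero-mean and jointly Gaussian,
% with covariances \Sigma_{ij} = \mathbb{E}[x_i x_j].
\mathbb{E}[x_1 x_2 x_3 x_4]
  = \Sigma_{12}\,\Sigma_{34}
  + \Sigma_{13}\,\Sigma_{24}
  + \Sigma_{14}\,\Sigma_{23}
```

The sum runs over the three ways of pairing four indices, which is the kind of combinatorial symmetry the abstract says the moment tensor inherits.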

    t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

    The ASVspoof challenge series was born to spearhead research in anti-spoofing for automatic speaker verification (ASV). The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric. While a strategic approach to assessment at the time, it has certain shortcomings. First, the CM EER is not necessarily a reliable predictor of performance when ASV and CMs are combined. Second, the EER operating point is ill-suited to user authentication applications, e.g. telephone banking, characterised by a high target user prior but a low spoofing attack prior. We aim to migrate from CM- to ASV-centric assessment with the aid of a new tandem detection cost function (t-DCF) metric. It extends the conventional DCF used in ASV research to scenarios involving spoofing attacks. The t-DCF metric has 6 parameters: (i) false alarm and miss costs for both systems, and (ii) prior probabilities of target and spoof trials (with an implied third, nontarget prior). The study is intended to serve as a self-contained, tutorial-like presentation. We analyse with the t-DCF a selection of top-performing CM submissions to the 2015 and 2017 editions of ASVspoof, with a focus on the spoofing attack prior. Whereas there is little to choose between countermeasure systems for lower priors, system rankings derived with the EER and t-DCF show differences for higher priors. We observe some ranking changes. Findings support the adoption of the DCF-based metric into the roadmap for future ASVspoof challenges, and possibly for other biometric anti-spoofing evaluations.

    A New Spin on Galactic Dust

    We present a new puzzle involving Galactic microwave emission and attempt to resolve it. On one hand, a cross-correlation analysis of the WHAM H-alpha map with the Tenerife 10 and 15 GHz maps shows that the well-known DIRBE-correlated microwave emission cannot be dominated by free-free emission. On the other hand, recent high resolution observations in the 8-10 GHz range with the Green Bank 140 ft telescope by Finkbeiner et al. failed to find the corresponding 8 sigma signal that would be expected in the simplest spinning dust models. So what physical mechanism is causing this ubiquitous dust-correlated emission? We argue for a model predicting that spinning dust is the culprit after all, but that the corresponding small grains are well correlated with the larger grains seen at 100 micron only on large angular scales. In support of this grain segregation model, we find the best spinning dust template to involve higher frequency maps in the range 12-60 micron, where emission from transiently heated small grains is important. Upcoming CMB experiments such as ground-based interferometers, MAP and Planck LFI with high resolution at low frequencies should allow a definitive test of this model. Comment: Minor revisions to match accepted ApJ version. 6 pages, 4 figs. Color figures and more foreground information at http://www.hep.upenn.edu/~angelica/foreground.html#spin

    Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

    Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs. The reliability of spoofing CMs is typically gauged using the equal error rate (EER) metric. The primitive EER fails to reflect application requirements and the impact of spoofing and CMs upon ASV, and its use as a primary metric in traditional ASV research has long been abandoned in favour of risk-based approaches to assessment. This paper presents several new extensions to the tandem detection cost function (t-DCF), a recent risk-based approach to assess the reliability of spoofing CMs deployed in tandem with an ASV system. Extensions include a simplified version of the t-DCF with fewer parameters, an analysis of a special case for a fixed ASV system, simulations which give original insights into its interpretation, and new analyses using the ASVspoof 2019 database. It is hoped that adoption of the t-DCF for the CM assessment will help to foster closer collaboration between the anti-spoofing and ASV research communities. Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (doi updated
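As a point of reference for the EER metric discussed in this abstract, the following is a minimal sketch of how an equal error rate can be approximated from two lists of detection scores. It is not the evaluation code of the ASVspoof challenges or of the paper; the function name `equal_error_rate`, the threshold sweep, and the convention that higher scores mean "accept" are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumed convention: a trial is accepted when its score >= threshold.
    For a spoofing countermeasure, "target" would be bona fide speech and
    "nontarget" would be spoofed speech.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # Miss rate: fraction of target trials rejected at this threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm rate: fraction of nontarget trials accepted.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest
            # and report their average as the EER estimate.
            best_gap = gap
            eer = (p_miss + p_fa) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented for this example only.
    bona_fide = [0.9, 0.8, 0.7, 0.4]
    spoofed = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(bona_fide, spoofed))
```

The sketch reports a single operating point and ignores error costs and class priors, which is the limitation that risk-based metrics such as the t-DCF are described as addressing.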

    FIRST-based survey of Compact Steep Spectrum sources, II. MERLIN and VLA observations of Medium-sized Symmetric Objects

    A new sample of candidate Compact Steep Spectrum (CSS) sources that are much weaker than the CSS source prototypes has been selected from the VLA FIRST catalogue. MERLIN `snapshot' observations of the sources at 5 GHz indicate that six of them have an FR II-like morphology, but are not edge-brightened as is normal for Medium-sized Symmetric Objects (MSOs) and FR IIs. Further observations of these six sources with the VLA at 4.9 GHz and MERLIN at 1.7 GHz, as well as subsequent full-track observations with MERLIN at 5 GHz of what appeared to be the two sources of greatest interest, are presented. The results are discussed with reference to the established evolutionary model of CSS sources being young but in which not all of them evolve to become old objects with extended radio structures. A lack of stable fuelling in some of them may result in an early transition to a so-called coasting phase so that they fade away instead of growing to become large-scale objects. It is possible that one of the six sources (1542+323) could be labelled as a prematurely `dying' MSO or a `fader'. Comment: 13 pages, matches the version printed in Astronomy & Astrophysics

    Modeling the Dust Properties of z ~ 6 Quasars with ART^2 -- All-wavelength Radiative Transfer with Adaptive Refinement Tree

    The detection of large quantities of dust in z ~ 6 quasars by infrared and radio surveys presents puzzles for the formation and evolution of dust in these early systems. Previously (Li et al. 2007), we showed that luminous quasars at z > 6 can form through hierarchical mergers of gas-rich galaxies. Here, we calculate the dust properties of simulated quasars and their progenitors using a three-dimensional Monte Carlo radiative transfer code, ART^2 -- All-wavelength Radiative Transfer with Adaptive Refinement Tree. ART^2 incorporates a radiative equilibrium algorithm for dust emission, an adaptive grid for inhomogeneous density, a multiphase model for the ISM, and a supernova-origin dust model. We reproduce the SED and dust properties of SDSS J1148+5251, and find that the infrared emission is closely associated with the formation and evolution of the quasar host. The system evolves from a cold to a warm ULIRG owing to heating and feedback from stars and AGN. Furthermore, the AGN has significant implications for the interpretation of observations of the hosts. Our results suggest that vigorous star formation in merging progenitors is necessary to reproduce the observed dust properties of z ~ 6 quasars, supporting a merger-driven origin for luminous quasars at high redshifts and the starburst-to-quasar evolutionary hypothesis. (Abridged) Comment: 26 pages, 22 figures, accepted by ApJ. Version with full resolution images is available at http://www.cfa.harvard.edu/~yxli/ARTDUST/astroph0706.3706.pd

    Drug Adverse Event Detection in Health Plan Data Using the Gamma Poisson Shrinker and Comparison to the Tree-based Scan Statistic

    Background: Drug adverse event (AE) signal detection using the Gamma Poisson Shrinker (GPS) is commonly applied in spontaneous reporting. AE signal detection using large observational health plan databases can expand medication safety surveillance. Methods: Using data from nine health plans, we conducted a pilot study to evaluate the implementation and findings of the GPS approach for two antifungal drugs, terbinafine and itraconazole, and two diabetes drugs, pioglitazone and rosiglitazone. We evaluated 1676 diagnosis codes grouped into 183 different clinical concepts and four levels of granularity. Several signaling thresholds were assessed. GPS results were compared to findings from a companion study using the identical analytic dataset but an alternative statistical method—the tree-based scan statistic (TreeScan). Results: We identified 71 statistical signals across two signaling thresholds and two methods, including closely-related signals of overlapping diagnosis definitions. Initial review found that most signals represented known adverse drug reactions or confounding. About 31% of signals met the highest signaling threshold. Conclusions: The GPS method was successfully applied to observational health plan data in a distributed data environment as a drug safety data mining method. There was substantial concordance between the GPS and TreeScan approaches. Key method implementation decisions relate to defining exposures and outcomes and informed choice of signaling thresholds